AITopics | small network

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Neural Information Processing Systems | Dec-24-2025, 22:33:09 GMT

Recent remarkable progress in unsupervised visual representation learning relies on heavy networks with large-batch training. While recent methods have greatly reduced the gap between supervised and unsupervised performance of deep models such as ResNet-50, this development has been relatively limited for small models. In this work, we propose a novel unsupervised learning framework for small networks that combines deep self-supervised representation learning and knowledge distillation within one-phase training. In particular, a teacher model is trained to produce consistent cluster assignments between different views of the same image. Simultaneously, a student model is encouraged to mimic the predictions of the on-the-fly self-supervised teacher. For effective knowledge transfer, we adopt the idea of a domain classifier so that student training is guided by discriminative features invariant to the representational space shift between teacher and student. We also introduce a network-driven multi-view generation paradigm to capture rich feature information contained in the network itself. Extensive experiments show that our student models surpass state-of-the-art offline distilled networks, even those distilled from stronger self-supervised teachers, as well as top-performing self-supervised models. Notably, our ResNet-18, trained with a ResNet-50 teacher, achieves 68.3% ImageNet Top-1 accuracy on frozen-feature linear evaluation, which is only 1.5% below the supervised baseline.
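The idea of a student mimicking the teacher's predictions can be illustrated with a small sketch. This is not the authors' implementation: it assumes the teacher and student each output a vector of cluster logits for the same image, and it uses a plain cross-entropy between the two softmax distributions as the mimicry loss. The names `softmax` and `distillation_loss` and the `temperature` parameter are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw scores into a probability distribution (numerically stable)."""
    scaled = [z / temperature for z in logits]
    largest = max(scaled)
    exps = [math.exp(z - largest) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """Cross-entropy H(teacher, student); the teacher is treated as a fixed target.

    Assumption: this is a generic distillation loss, not the loss from the paper.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))

# Hypothetical cluster logits for one image.
teacher = [2.0, 0.5, -1.0]
matching_student = [2.0, 0.5, -1.0]
mismatched_student = [-1.0, 0.5, 2.0]

loss_match = distillation_loss(teacher, matching_student)
loss_mismatch = distillation_loss(teacher, mismatched_student)
```

The loss is smallest when the student reproduces the teacher's distribution, which is what pushes the student toward the teacher's cluster assignments during joint training.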

name change, small network, unsupervised representation transfer, (2 more...)

Neural Information Processing Systems

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.76)

Growing with Experience: Growing Neural Networks in Deep Reinforcement Learning

Fehring, Lukas, Lindauer, Marius, Eimer, Theresa

arXiv.org Artificial Intelligence | Jun-16-2025

While increasingly large models have revolutionized much of the machine learning landscape, training even mid-sized networks for Reinforcement Learning (RL) is still proving to be a struggle. This, however, severely limits the complexity of policies we are able to learn. To enable increased network capacity while maintaining network trainability, we propose GrowNN, a simple yet effective method that utilizes progressive network growth during training. We start training a small network to learn an initial policy. Then we add layers without changing the encoded function. Subsequent updates can utilize the added layers to learn a more expressive policy, adding capacity as the policy's complexity increases. GrowNN can be seamlessly integrated into most existing RL agents. Our experiments on MiniHack and MuJoCo show improved agent performance, with networks deepened incrementally by GrowNN outperforming their respective static counterparts of the same size by up to 48% on MiniHack Room and 72% on Ant.
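Adding a layer "without changing the encoded function" can be illustrated with an identity-initialized layer. The sketch below is an assumption about one way to do this, not the GrowNN code: it assumes a ReLU multilayer perceptron with at least one hidden layer, and inserts a new hidden layer with identity weights and zero bias just before the output layer. Because the incoming activations are already non-negative, passing them through identity weights and another ReLU leaves them unchanged. All function names are hypothetical.

```python
def relu(values):
    return [max(0.0, v) for v in values]

def linear(weights, bias, values):
    """Dense layer: one output per weight row."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]

def forward(layers, x):
    """layers is a list of (weights, bias); ReLU follows every layer but the last."""
    hidden = x
    for weights, bias in layers[:-1]:
        hidden = relu(linear(weights, bias, hidden))
    weights, bias = layers[-1]
    return linear(weights, bias, hidden)

def identity_layer(width):
    weights = [[1.0 if i == j else 0.0 for j in range(width)]
               for i in range(width)]
    return weights, [0.0] * width

def grow(layers):
    """Insert an identity hidden layer just before the output layer.

    Assumes the network already has at least one ReLU hidden layer, so the
    activations entering the new layer are non-negative.
    """
    width = len(layers[-1][0][0])  # input width of the output layer
    return layers[:-1] + [identity_layer(width), layers[-1]]

# A small hypothetical policy network: 2 inputs -> 2 hidden units -> 1 output.
small = [
    ([[1.0, -2.0], [0.5, 0.3]], [0.1, -0.2]),
    ([[1.5, -1.0]], [0.05]),
]
deeper = grow(small)
x = [0.7, -0.4]
before = forward(small, x)
after = forward(deeper, x)
```

Here `before` and `after` agree, so training can continue from the grown network without a jump in behavior; later gradient updates are then free to move the new layer away from the identity.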

machine learning, proc, reinforcement learning, (14 more...)

arXiv.org Artificial Intelligence

2506.11706

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Eau De Q-Network: Adaptive Distillation of Neural Networks in Deep Reinforcement Learning

Vincent, Théo, Faust, Tim, Tripathi, Yogesh, Peters, Jan, D'Eramo, Carlo

arXiv.org Artificial Intelligence | Mar-3-2025

Recent works have successfully demonstrated that sparse deep reinforcement learning agents can be competitive against their dense counterparts. This opens up opportunities for reinforcement learning applications in fields where inference time and memory requirements are cost-sensitive or limited by hardware. Until now, dense-to-sparse methods have relied on hand-designed sparsity schedules that are not synchronized with the agent's learning pace. Crucially, the final sparsity level is chosen as a hyperparameter, which requires careful tuning, as setting it too high might lead to poor performance. In this work, we address these shortcomings by crafting a dense-to-sparse algorithm that we name Eau De Q-Network (EauDeQN). To increase sparsity at the agent's learning pace, we consider multiple online networks with different sparsity levels, where each online network is trained from a shared target network. At each target update, the online network with the smallest loss is chosen as the next target network, while the other networks are replaced by a pruned version of the chosen network. We evaluate the proposed approach on the Atari 2600 benchmark and the MuJoCo physics simulator, showing that EauDeQN reaches high sparsity levels while keeping performance high.
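The target-update rule described above can be sketched in a few lines. This is a simplified illustration, not the EauDeQN implementation: each network is represented as a flat list of weights, pruning is done by magnitude (an assumption; the abstract does not say how pruning is performed), and the per-network `sparsity_levels` list is hypothetical.

```python
def prune_smallest(weights, sparsity):
    """Zero out the given fraction of weights with the smallest magnitude.

    Assumption: magnitude pruning stands in for whatever pruning the paper uses.
    """
    n_prune = int(len(weights) * sparsity)
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned = list(weights)
    for index in order[:n_prune]:
        pruned[index] = 0.0
    return pruned

def target_update(online_networks, losses, sparsity_levels):
    """Pick the online network with the smallest loss as the next target.

    Every other online network is replaced by a pruned copy of the winner.
    """
    best = min(range(len(online_networks)), key=lambda i: losses[i])
    new_target = list(online_networks[best])
    new_online = [
        list(new_target) if i == best
        else prune_smallest(new_target, sparsity_levels[i])
        for i in range(len(online_networks))
    ]
    return new_target, new_online

# Three hypothetical online networks with their losses and sparsity levels.
online = [
    [0.9, -0.1, 0.4, 0.05],
    [0.8, 0.0, 0.5, 0.0],
    [1.0, 0.0, 0.0, 0.0],
]
losses = [0.30, 0.25, 0.60]
sparsity_levels = [0.0, 0.5, 0.75]

target, online = target_update(online, losses, sparsity_levels)
```

Because a sparser network only becomes the target when its loss is the smallest, sparsity increases only as fast as the agent can tolerate it, which is the synchronization with learning pace that the abstract describes.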

final sparsity level, online network, sparsity level, (14 more...)

arXiv.org Artificial Intelligence

2503.01437

Country:

North America > Canada > Alberta (0.14)
Europe > Germany > Hesse > Darmstadt Region > Darmstadt (0.04)
Europe > Germany > Bavaria > Middle Franconia > Nuremberg (0.04)
Europe > Germany > Bavaria > Lower Franconia > Würzburg (0.04)

Genre: Research Report (0.50)

Industry:

Education (0.46)
Leisure & Entertainment (0.30)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Unsupervised Representation Transfer for Small Networks: I Believe I Can Distill On-the-Fly

Neural Information Processing Systems | Jan-19-2025, 06:03:26 GMT

distill on-the-fly, small network, unsupervised representation transfer

Neural Information Processing Systems

Industry: Education (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning (0.80)

Parallelizing neural networks on one GPU with JAX

#artificialintelligence | Feb-19-2021, 22:46:25 GMT

Most neural network libraries these days give amazing computational performance for training large neural networks. But small networks, which aren't big enough to usefully "fill" a GPU, leave a lot of available compute unused. Running a small network on a GPU is a bit like buying an apartment building and then living in the janitor's closet. In this article, I describe how to get your money's worth by training dozens of networks at once. As you follow along, we'll efficiently train dozens of small neural networks in parallel on a single GPU using the vmap function from JAX. Whether you are training ensembles, sweeping over hyperparameters, or averaging across random seeds, this technique can give you a 10x-100x improvement in computation time. If you haven't tried JAX yet, this may give you a reason to.
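A minimal sketch of the technique, assuming JAX is installed: `jax.vmap` maps an initializer and a training step over a leading "network" axis, so many small networks train in one batched computation. The network size, learning rate, and toy XOR dataset are illustrative choices, not taken from the article, and the helper names (`init_params`, `sgd_step`, and so on) are hypothetical.

```python
import jax
import jax.numpy as jnp

def init_params(key, n_in=2, n_hidden=8, n_out=1):
    """Parameters of one small MLP."""
    k1, k2 = jax.random.split(key)
    return {
        "w1": jax.random.normal(k1, (n_in, n_hidden)) * 0.5,
        "b1": jnp.zeros(n_hidden),
        "w2": jax.random.normal(k2, (n_hidden, n_out)) * 0.5,
        "b2": jnp.zeros(n_out),
    }

def predict(params, x):
    hidden = jnp.tanh(x @ params["w1"] + params["b1"])
    return hidden @ params["w2"] + params["b2"]

def loss_fn(params, x, y):
    return jnp.mean((predict(params, x) - y) ** 2)

def sgd_step(params, x, y, lr=0.1):
    """One gradient-descent step for a single network."""
    grads = jax.grad(loss_fn)(params, x, y)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

n_networks = 32
keys = jax.random.split(jax.random.PRNGKey(0), n_networks)

# Each parameter array gains a leading axis of size n_networks.
batched_params = jax.vmap(init_params)(keys)

# in_axes=(0, None, None): map over the networks, share the data.
batched_step = jax.jit(jax.vmap(sgd_step, in_axes=(0, None, None)))
batched_loss = jax.vmap(loss_fn, in_axes=(0, None, None))

# Toy XOR dataset shared by all networks.
x = jnp.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = jnp.array([[0.0], [1.0], [1.0], [0.0]])

initial_losses = batched_loss(batched_params, x, y)
for _ in range(200):
    batched_params = batched_step(batched_params, x, y)
final_losses = batched_loss(batched_params, x, y)
```

The single-network functions are written as if only one model existed; `vmap` adds the batch dimension afterwards. Here the networks differ only in their random seed, but the same pattern could map over learning rates or other hyperparameters by giving them a mapped axis as well.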

batch, dataset, neural network, (14 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Surprisal-Triggered Conditional Computation with Neural Networks

Lugosch, Loren, Nowrouzezahrai, Derek, Meyer, Brett H.

arXiv.org Machine Learning | Jun-2-2020

Autoregressive neural network models have been used successfully for sequence generation, feature extraction, and hypothesis scoring. This paper presents yet another use for these models: allocating more computation to more difficult inputs. In our model, an autoregressive model is used both to extract features and to predict observations in a stream of input observations. The surprisal of the input, measured as the negative log-likelihood of the current observation according to the autoregressive model, is used as a measure of input difficulty. This in turn determines whether a small, fast network, or a big, slow network, is used. Experiments on two speech recognition tasks show that our model can match the performance of a baseline in which the big network is always used with 15% fewer FLOPs.
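The routing rule can be sketched as follows. This is only an illustration of the definition given above (surprisal as the negative log-likelihood of the current observation); the fixed `threshold`, the use of natural logarithms, and the function names are assumptions rather than details from the paper.

```python
import math

def surprisal(probability):
    """Negative log-likelihood (in nats) of the current observation."""
    return -math.log(probability)

def choose_network(probability_of_observation, threshold):
    """Route surprising inputs to the big network, the rest to the small one."""
    if surprisal(probability_of_observation) > threshold:
        return "big"
    return "small"

# Hypothetical likelihoods that an autoregressive model assigns to three
# consecutive observations in a stream.
likelihoods = [0.9, 0.5, 0.1]
routes = [choose_network(p, threshold=1.0) for p in likelihoods]
```

With a threshold of 1.0 nat, only the observation with likelihood 0.1 (surprisal of about 2.3) is sent to the big network, so the expensive model runs only on the input the predictor found hard.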

autoregressive model, neural network, surprisal, (15 more...)

arXiv.org Machine Learning

2006.01659

Country:

North America > Canada > Quebec > Montreal (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Hyperparameter Optimization: A Spectral Approach

Hazan, Elad, Klivans, Adam, Yuan, Yang

arXiv.org Artificial Intelligence | Jan-19-2018

We give a simple, fast algorithm for hyperparameter optimization inspired by techniques from the analysis of Boolean functions. We focus on the high-dimensional regime where the canonical example is training a neural network with a large number of hyperparameters. The algorithm, an iterative application of compressed sensing techniques for orthogonal polynomials, requires only uniform sampling of the hyperparameters and is thus easily parallelizable. Experiments for training deep neural networks on CIFAR-10 show that compared to state-of-the-art tools (e.g., Hyperband and Spearmint), our algorithm finds significantly improved solutions, in some cases better than what is attainable by hand-tuning. In terms of overall running time (i.e., time required to sample various settings of hyperparameters plus additional computation time), we are at least an order of magnitude faster than Hyperband and Bayesian Optimization. We also outperform Random Search by 8x. Additionally, our method comes with provable guarantees and yields the first improvements on the sample complexity of learning decision trees in over two decades. In particular, we obtain the first quasi-polynomial time algorithm for learning noisy decision trees with polynomial sample complexity.
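The Boolean-function view behind this approach can be illustrated with a toy example. The sketch below is not the paper's algorithm: instead of compressed sensing on uniform random samples, it enumerates every setting of a few binary hyperparameters (encoded as -1 or +1) and computes the low-degree Fourier coefficients by direct averaging. The `validation_loss` function is a made-up stand-in for training a network, and all names are hypothetical.

```python
from itertools import combinations, product

def parity(x, subset):
    """Parity (character) function: product of the selected +/-1 coordinates."""
    result = 1
    for i in subset:
        result *= x[i]
    return result

def fourier_coefficients(f, n_vars, max_degree):
    """Exact Fourier coefficients of f on {-1, 1}^n_vars up to max_degree.

    Assumption: full enumeration replaces the sampling plus sparse recovery
    used in the paper; this is only feasible for a handful of variables.
    """
    points = list(product([-1, 1], repeat=n_vars))
    coefficients = {}
    for degree in range(max_degree + 1):
        for subset in combinations(range(n_vars), degree):
            total = sum(f(x) * parity(x, subset) for x in points)
            coefficients[subset] = total / len(points)
    return coefficients

def validation_loss(x):
    # Hypothetical loss that depends only on hyperparameters 0 and 2.
    return 1.0 - 0.5 * x[0] + 0.25 * x[0] * x[2]

coefficients = fourier_coefficients(validation_loss, n_vars=4, max_degree=2)
important = {s: c for s, c in coefficients.items() if abs(c) > 1e-9}
```

Only the empty set, `(0,)`, and `(0, 2)` end up with non-zero coefficients, which identifies hyperparameters 0 and 2 as the ones that matter. The sparsity of this low-degree expansion is what makes recovery from a modest number of random samples plausible in the high-dimensional setting.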

algorithm, artificial intelligence, machine learning, (20 more...)

arXiv.org Artificial Intelligence

1706.00764

Country:

North America > United States (1.00)
Europe (1.00)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.88)

How to debug neural networks. Manual. – Hacker Noon

#artificialintelligence | Aug-23-2017, 13:55:42 GMT

Debugging neural networks can be a tough job, even for experts in the field. Millions of parameters are tied together, and even one small change can break all your hard work. Without debugging and visualization, every action is a coin flip, and what is worse, it eats up your time. Here I gather practices that will help you find problems earlier. Try to overfit your model on a small dataset: generally, your neural net should overfit your data within a few hundred iterations.
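The overfitting check can be sketched with a deliberately tiny model. The example below is an illustration in plain Python rather than code from the article: it fits a two-parameter linear model to four hand-made points with gradient descent and checks that the training loss collapses within a few hundred steps. With a real network, the same check would use a handful of training examples and the real training loop; the dataset, learning rate, and step count here are assumptions.

```python
# Four hypothetical training points generated by y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]

def mse(w, b):
    """Mean squared error of the model y = w * x + b on the tiny dataset."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w, b):
    """Gradient of the mean squared error with respect to w and b."""
    n = len(xs)
    grad_w = sum(2.0 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum(2.0 * (w * x + b - y) for x, y in zip(xs, ys)) / n
    return grad_w, grad_b

w, b = 0.0, 0.0
learning_rate = 0.1
initial_loss = mse(w, b)

for step in range(300):
    grad_w, grad_b = gradient(w, b)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

final_loss = mse(w, b)
```

If the loss does not drop close to zero on data this small, the problem is more likely in the model, the loss, or the update code than in the dataset, which is why this check is a cheap first step.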

artificial intelligence, machine learning, neural network, (15 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
